Mining Outliers with Faster Cutoff Update and Space Utilization
نویسندگان
چکیده
It is desirable to find unusual data objects by Ramaswamy et al’s distance-based outlier definition because only a metric distance function between two objects is required. It does not need any neighborhood distance threshold required by many existing algorithms based on the definition of Knorr and Ng. Bay and Schwabacher proposed an efficient algorithm ORCA, which can give near linear time performance, for this task. To further reduce the running time, we propose in this paper two algorithms RC and RS using the following two techniques respectively: (i) faster cutoff update, and (ii) space utilization after pruning. We tested RC, RS and RCS (a hybrid approach combining both RC and RS) on several large and high-dimensional real data sets with millions of objects. The experiments show that the speed of RCS is as fast as 1.4 to 2.3 times that of ORCA, and the improvement of RCS is relatively insensitive to the increase in the data size.
منابع مشابه
Testing the Exactitude of Estimation Methods in the Presence of Outliers: An accounting for Robust Kriging
Estimation of gold reserves and resources has been of interest to mining engineers and geologists for ages. The existence of outlier values shows the economic part of the deposits subject to the fact that don’t depend on the human or technical errors. The presence of these high values causes a pseudo dramatically increment in variance estimation of economical blocks when applying conventional m...
متن کاملCutoff grades optimization with environmental management A case study: Sungun Copper Project
Cutoff grade is a grade used to assign a destination label to a parcel of material. The optimal cutoff grades depend on all the salient technological features of mining, such as the capacity of extraction and of milling, the geometry and geology of the orebody, and the optimal grade of concentrate to send to the smelter. The main objective of each optimization of mining operation is to maximize...
متن کاملOutlier Detection in Wireless Sensor Networks Using Distributed Principal Component Analysis
Detecting anomalies is an important challenge for intrusion detection and fault diagnosis in wireless sensor networks (WSNs). To address the problem of outlier detection in wireless sensor networks, in this paper we present a PCA-based centralized approach and a DPCA-based distributed energy-efficient approach for detecting outliers in sensed data in a WSN. The outliers in sensed data can be ca...
متن کاملLocal multivariate outliers as geochemical anomaly halos indicators, a case study: Hamich area, Southern Khorasan, Iran
Anomaly recognition has always been a prominent subject in preliminary geochemical explorations. Among the regional geochemical data processing, there are a range of statistical and data mining techniques as well as different mapping methods, which serve as presentations of the outputs. The outlier’s values are of interest in the investigations where data are gathered under controlled condition...
متن کاملEfficient Maintenance and Update of Nonbonded Lists in Macromolecular Simulations
Molecular mechanics and dynamics simulations use distance based cutoff approximations for faster computation of pairwise van der Waals and electrostatic energy terms. These approximations traditionally use a precalculated and periodically updated list of interacting atom pairs, known as the "nonbonded neighborhood lists" or nblists, in order to reduce the overhead of finding atom pairs that are...
متن کامل